Recovery of Memory and Process in DSM

Author

  • Zheng Zhang
Abstract

Keywords: multiprocessor, shared memory, high availability

In this report, we discuss the recovery of memory and processes on the platform of a shared-memory DSM system. We divide the problem into recovery of unaffected memory (RUM) and recovery of affected processes (RAP). We point out that specially designed fault-tolerant, non-volatile memory is neither sufficient nor necessary to solve the problem of RUM. It is not sufficient because the system can go down when one node goes away, which can result from many types of faults; power failure is but one of them. It is not necessary either, because the system is distributed in nature: information redundancy across fault units can therefore be realized without using special memory. We discuss several ways of implementing a fault-tolerant memory system using plain memory by modifying the write-back protocols in DSM systems. The proposed techniques include mirroring and RAIM, which stands for Redundant Array of Independent Memory. In addition to attacking the problem of RUM, the fault-tolerant memory system lays the foundation for other HA solutions. We use a novel approach to survey the space of transparent rollback recovery alternatives as our means to target RAP. Two axes constitute our space. The first axis is the fraction of the fault-tolerant memory system that is part of the reliable storage. This, in many ways, determines the cost of the system as well as the checkpoint bandwidth. The second axis is how and when the checkpoint image is established and committed. The three options, built-on-the-fly, stop-and-forward and copy-on-write, have different system complexity and performance implications.
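The idea of keeping redundant information across fault units at write-back time can be illustrated with a small sketch. The abstract does not say how RAIM encodes its redundancy, so the XOR parity used here (in the style of RAID) is an assumption, and the names `ParityGroup`, `write_back` and `reconstruct` are hypothetical. Nodes are simulated as in-process byte arrays rather than real DSM memory.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class ParityGroup:
    """Hypothetical RAIM-style group: N data nodes plus one parity node.

    Assumption: redundancy is a single XOR parity block, which tolerates
    the loss of any one data node. The source does not specify this.
    """

    def __init__(self, num_nodes, block_size):
        self.block_size = block_size
        # One plain-memory block per simulated node.
        self.data = [bytearray(block_size) for _ in range(num_nodes)]
        # Parity block, imagined to live on a separate fault unit.
        self.parity = bytearray(block_size)

    def write_back(self, node, offset, new_bytes):
        """Write dirty bytes home and update parity in the same step."""
        end = offset + len(new_bytes)
        if offset < 0 or end > self.block_size:
            raise ValueError("write exceeds block bounds")
        old_bytes = self.data[node][offset:end]  # copy of the old contents
        for i, (old, new) in enumerate(zip(old_bytes, new_bytes)):
            # parity ^= old ^ new keeps parity equal to the XOR of all nodes.
            self.parity[offset + i] ^= old ^ new
        self.data[node][offset:end] = new_bytes

    def reconstruct(self, lost_node):
        """Rebuild a lost node's block from the survivors and the parity."""
        survivors = [bytes(block) for i, block in enumerate(self.data)
                     if i != lost_node]
        return xor_blocks(survivors + [bytes(self.parity)])
```

Mirroring corresponds to the simpler case in which each write-back is sent to two nodes instead of updating a parity block. Under the parity assumption, one extra block covers a whole group, at the cost of a read-modify-write on every write-back.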

Similar resources

Coordinated Checkpointing-Rollback Error Recovery for Distributed Shared Memory Multicomputers

Most recovery schemes that have been proposed for Distributed Shared Memory (DSM) systems require unnecessarily high checkpointing frequency and checkpoint traffic, which are sensitive to the frequency of interprocess communication in the applications. For message-passing systems, low overhead error recovery based on coordinated checkpointing allows the frequency of checkpointing to be determin...

Effect of Electrical Current on Nitinol Medical Staples Shape Memory

Medical staples are one application of nitinol shape memory alloys; their pedicles must bend to an appropriate angle at a specified temperature to function properly. This is achievable through the shape memory effect. For this purpose, samples of nitinol superelastic alloys were bent in a steel mold at different angles and formed into orthopedic staples. Then shape memory effect was induced...

Trend of Memory Recovery after Benzodiazepine Overdose

Because of differences in the toxicokinetic and pharmacokinetic characteristics of drugs, and given the high rate of benzodiazepine abuse in suicide attempts, the trend of possible anterograde amnesia after recovery from benzodiazepine overdose was evaluated in Iranian patients. This was a prospective descriptive/analytic clinical trial, which was conducted in Noor general teaching h...

Checkpointing and recovery in a transaction-based DSM operating system

Reliability of cluster systems can be improved by periodically saving checkpoints in stable storage. In case of an error, backward error recovery can restart the cluster from the last checkpoint, thus avoiding a fallback to the initial state. Different strategies originally developed for message-passing systems have been adapted for Distributed Shared Memory (DSM) systems. However, it is no...

Publication date: 2001